Modeling Cross-morpheme Pro for Korean Large Vocabulary Cont

نویسنده

  • Kyong-Nim Lee
چکیده

In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing morpheme-based pronunciation lexicon for Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. Since phonemic context together with morphological category and morpheme boundary information affect Korean pronunciation variations, we have distinguished pronunciation variation rules according to the locations such as within a morpheme, across a morpheme boundary in a compound noun, across a morpheme boundary in an eojeol, and across an eojeol boundary. In 33K-morpheme Korean CSR experiment, an absolute improvement of 1.16% in WER from the baseline performance of 23.17% WER is achieved by modeling crossmorpheme pronunciation variations with a context-dependent multiple pronunciation lexicon.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Pronunciation lexicon modeling and design for Korean large vocabulary continuous speech recognition

In this paper, we describe a pronunciation lexicon model which is especially useful for constructing morpheme-based pronunciation lexicon to improve the performance of a Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. For modeling of cross-morpheme pronunciation variations, we usually used a context-dependent multiple pronunciatio...

متن کامل

Korean large vocabulary continuous speech recognition with morpheme-based recognition units

In Korean writing, a space is placed between two adjacent word-phrases, each of which generally corresponds to two or three words in English in a semantic sense. If the word-phrase is used as a recognition unit for Korean large vocabulary continuous speech recognition (LVCSR), the out-of-vocabulary (OOV) rate becomes very large. If a morpheme or a syllable is used instead, a severe inter-morphe...

متن کامل

Korean large vocabulary continuous speech recognition using pseudomorpheme units

This paper presents a Korean large vocabulary continuous speech recognition system based on pseudomorpheme units. In Korean, an eojeol (word phrase) is a unit for spacing and a morpheme is the smallest unit with semantic meaning. If the eojeol is used as the dictionary and language modeling unit, the number of the unit becomes enormous. Instead we propose to use modified morpheme or pseudomorph...

متن کامل

Large vocabulary continuous speech recognition based on cross-morpheme phonetic information

In this paper, we present a novel method to regulate lexical connections among morpheme-based pronunciation lexicons for Korean large vocabulary continuous speech recognition (LVCSR) systems. A pronunciation dictionary plays an important role in subword-based LVCSR in that pronunciation variations such as coarticulation will deteriorate the performance of an LVCSR system if it is not well accou...

متن کامل

Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS

This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection, morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003